Introduction to Database Reverse Engineering
نویسنده
چکیده
Database reverse engineering mainly deals with schema extraction, analysis and transformation. In the same way as for any other database engineering process, it must rely on a rich set of models. These models must be able to describe data structures at different levels of abstraction, ranging from physical to conceptual, and according to various modeling paradigms. In addition, statically describing data structures is insufficient. We must be able to describe how a schema has evolved into another one. For instance, a physical schema leads to a logical schema, which in turn is translated into a conceptual schema. These transitions, which form the basis of DBRE, can be explained in a systematic way through the concept of schema transformation. In this section, we describe a data structure model with which schemas at different abstraction levels and according to the most common paradigms can be described precisely. Then, we present schema transformation operators that can explain inter-schema transitions. 2-2 2 • Data schema specification J-L Hainaut 2002 12/5/2002 2.1 A wide-spectrum specification model In database development methodologies, the complexity is broken down by considering three abstraction levels. The engineering requirements are distributed among these levels, ranging from correctness to efficiency. At the conceptual level, the designer produces a technologyindependent specification of the information, expressed as a conceptual schema, relying on an ad hoc formalism, called a conceptual model. At the logical level, the information is expressed in a model for which a technology exists. For instance, the required information is organized into a relational or object-oriented logical schema. Since reverse engineering is concerned with legacy systems, we will also consider CODASYL DBTG, IMS, TOTAL/IMAGE, COBOL, BASIC, RPG or ADABAS logical schemas. While a logical schema is based on a family of DMS models, a physical schema is dedicated to a specific DMS. In addition to the logical constructs, it includes technical specifications that govern data storage, access mechanisms, concurrency protocols or recovery parameters. We will talk about network logical schemas, but about, say, IDMS physical schemas. Due to the scope of this presentation, we cannot adopt specific formalisms for each of the abstraction levels. Instead, we will base the discussion on generic models that can easily be specialized into specific models. For instance, this model hierarchy can be translated into OMT/ relational/ORACLE 8, into ERA/CODASYL/IDMS or into ORM/OO/O2 design methodologies. These models are derived from a unique model the formal basis of which has been developed in [Hainaut 1989]. Conceptual specifications A conceptual schema mainly specifies entity types (or object classes), relationship types and attributes. Entity types can be organized into ISA hierarchies (organizing supertypes and subtypes), with total and disjoint properties. Attributes can be atomic or compound. They are characterized with a cardinality constraint [i-j] stating how many values can be associated with a parent instance (default is [1-1]). A relationship type has 2 or more roles. Each role also has a cardinality constraint [i-j] that states in how many relationships an entity will appear with this role. Entity types and relationship types can have identifiers, made up of attributes and/or remote roles. The source value set of an attribute can be a basic domain (e.g., numeric, boolean, character, time), a user-defined domain (e.g., VAT_number, Address, Pers_ID, URL) or an object class (in some OO models). Some of these constructs are illustrated in Figure 2-1. In some situations, we will need more complex constraints, such as existence constraints, which state that a set B of components (attributes or remote roles) can be constrained by coexistence (either all elements of B must have a value or none), exclusive (at most one element of B can have a value) and at-least-one (at least one element of B must have a value) relations. The exact terms used to denote these constructs will vary according to the specific model chosen. For instance, entity types will be called object classes in OMT or NOLOT in NIAM-like formalisms. 2.1 A wide-spectrum specification model 2-3 12/5/2002 J-L Hainaut 2002 Figure 2-1: Conceptual structures. Logical specifications A logical schema comprises data structure definitions according to one of the commonly used families of models: relational model, network model (CODASYL DBTG), hierarchical model (IMS), shallow model (TOTAL, IMAGE), inverted file model (DATACOM/DB), standard file model (COBOL, C, RPG, BASIC) or object-oriented model. For instance, the schema of Figure 2-2 can be considered as a possible relational translation the conceptual schema of Figure 2-1. Similarly, one could have designed a network, hierarchical or OO equivalent schema. Since we want to discuss reverse engineering problems independently of the DMS model, we will use general terms such as record type, inter-record relationship type and field. For a specific model, these terms will translate into the specific terminology of this model. For instance, record types will be called table in relational schemas, segment types in hierarchical schemas, and data sets in shallow schemas. An inter-record relationship will be read set type in the network model, access path in the shallow model and parent-child relationship in the hierarchical model. New constructs appear at this level, such as special kinds of multivalued attributes (bag, list, This schema includes entity types (or object classes) PERSON, CUSTOMER, SUPPLIER, ORDER and PRODUCT. PERSON has two disjoint subtypes, CUSTOMER and SUPPLIER. Relationship type from is binary while detail is ternary. Each ORDER entity appears in exactly 1 from relationship (cardinality [1-1]), and in at least one detail relationships (cardinality [1-N]). Entity types and relationship types can be given attributes. For entity type PERSON, attribute Name is atomic, single-valued and mandatory. Address is a compound attribute. Its component Num is atomic, single-valued and optional (cardinality [0-1]). Phone is multivalued and optional: there are from 0 to 5 values per entity. {PID} is the identifier of PERSON. The identifier of ORDER is made of external entity type from.CUSTOMER and of local attribute ONum. There cannot exist more than one detail relationship with the same ORDER and PRODUCT entites. 1-1 0-N
منابع مشابه
Database Reverse Engineering
Database reverse engineering consists of recovering the abstract descriptions of files and databases of legacy information systems. A legacy information system can be defined as a “data-intensive application, such as [a] business system based on hundreds or thousands of data files (or tables), that significantly resists modifications and changes” (Brodie & Stonebraker, 1995). The objective of d...
متن کاملInference of Aggregate Relationships through Database Reverse Engineering
This paper presents a process to improve the reverse engineering of relational databases. Our process extracts the current aggregate relationships from a relational database through a combination of data dictionary, data schema and data instance analysis. The process we propose can refine conceptual diagrams of commercial tools with reverse engineering options as Power AMC (Sybase), Designer (O...
متن کاملTransformation-Based Database Engineering
In this chapter, we develop a transformational framework in which many database engineering processes can be modeled in a precise way, and in which properties such as semantics preservation and propagation can be studied rigorously. Indeed, the transformational paradigm is particularly suited to database schema manipulation and translation, that are the basis of such processes as schema normali...
متن کاملReverse Engineering of OO constructs in Object-Relational Database Schemas
Reverse engineering applied to databases permits to extract a conceptual schema that represents, at a higher level of abstraction, the database implementation. This resulting conceptual schema may be used to facilitate, among others, system maintenance, evolution and reuse. In the last years, the use of object-relational constructs was incorporated into database development. However, reverse en...
متن کاملInformal and Intelligent Acquisition of Semantic Constraints in Database Design and Reverse Engineering 1
The main objective of database modelling is the design of a database that is correct and can be processed eeciently by a database management system. The eeciency and correctness of a database depends among other things on knowledge about database semantics because semantic constraints are the prerequisite for normal-isation and restructuring operations. Acquisition of semantic constraints remai...
متن کاملDomain-retargetable reverse engineering. III. Layered modeling
This paper describes ongoing work on a domainretargetable reverse engineering environment which is used to aid the structural understanding of large information spaces. In particular, it presents a layered modeling approach to representing three classes of artifacts manipulated during the reverse engineering process. The approach provides a practical and extensible method of integrating existin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002